{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe03b687-98ba-42de-aec6-29e5e8c9313d",
   "metadata": {},
   "source": [
    "Chapter 09\n",
    "\n",
    "# 施密特正交化\n",
    "《线性代数》 | 鸢尾花书：数学不难"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ff0ad23-e015-4068-b2a8-7fe8f9f12f54",
   "metadata": {},
   "source": [
    "这段代码实现的是三维向量的 **格拉姆-施密特正交化（Gram-Schmidt Orthogonalization）**，目的是将矩阵 $A$ 的列向量 $\\boldsymbol{a}_1, \\boldsymbol{a}_2, \\boldsymbol{a}_3$ 转换为一组单位正交基 $\\boldsymbol{q}_1, \\boldsymbol{q}_2, \\boldsymbol{q}_3$，使得 $Q = [\\boldsymbol{q}_1\\ \\boldsymbol{q}_2\\ \\boldsymbol{q}_3]$ 满足 $Q^\\top Q = I$，即每个向量长度为1且两两正交。\n",
    "\n",
    "---\n",
    "\n",
    "### 初始化与列向量提取\n",
    "\n",
    "首先定义一个矩阵  \n",
    "$$\n",
    "A = \\begin{bmatrix}\n",
    "2 & 2 & 0 \\\\\n",
    "2 & 0 & 2 \\\\\n",
    "0 & 2 & 2\n",
    "\\end{bmatrix}\n",
    "$$  \n",
    "随后提取其三列，分别命名为 $\\boldsymbol{a}_1, \\boldsymbol{a}_2, \\boldsymbol{a}_3$，用于后续正交化处理。\n",
    "\n",
    "---\n",
    "\n",
    "### 第一步：构造 $\\boldsymbol{q}_1$\n",
    "\n",
    "$$\n",
    "\\boldsymbol{q}_1 = \\frac{\\boldsymbol{a}_1}{\\|\\boldsymbol{a}_1\\|}\n",
    "$$\n",
    "\n",
    "这一步将 $\\boldsymbol{a}_1$ 单位化，得到 $\\boldsymbol{q}_1$。此时 $\\boldsymbol{q}_1$ 是一个单位向量，方向与 $\\boldsymbol{a}_1$ 相同。验证 $\\boldsymbol{q}_1^\\top \\boldsymbol{q}_1 = 1$ 说明其已标准化。\n",
    "\n",
    "---\n",
    "\n",
    "### 第二步：构造 $\\boldsymbol{q}_2$\n",
    "\n",
    "将 $\\boldsymbol{a}_2$ 在 $\\boldsymbol{q}_1$ 方向上投影并减去：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{\\eta}_2 = \\boldsymbol{a}_2 - (\\boldsymbol{q}_1 \\boldsymbol{q}_1^\\top)\\boldsymbol{a}_2\n",
    "$$\n",
    "\n",
    "其中，$\\boldsymbol{q}_1 \\boldsymbol{q}_1^\\top$ 是一个**投影矩阵**，将任意向量投影到 $\\boldsymbol{q}_1$ 所张成的一维子空间上。\n",
    "\n",
    "然后将残差向量单位化：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{q}_2 = \\frac{\\boldsymbol{\\eta}_2}{\\|\\boldsymbol{\\eta}_2\\|}\n",
    "$$\n",
    "\n",
    "这样得到的 $\\boldsymbol{q}_2$ 与 $\\boldsymbol{q}_1$ 正交，且是单位向量。验证 $\\boldsymbol{q}_2^\\top \\boldsymbol{q}_2 = 1$ 和 $\\boldsymbol{q}_1^\\top \\boldsymbol{q}_2 = 0$，分别说明单位化和正交性成立。\n",
    "\n",
    "---\n",
    "\n",
    "### 第三步：构造 $\\boldsymbol{q}_3$\n",
    "\n",
    "从 $\\boldsymbol{a}_3$ 中减去它在 $\\boldsymbol{q}_1$ 和 $\\boldsymbol{q}_2$ 上的投影：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{\\eta}_3 = \\boldsymbol{a}_3 - (\\boldsymbol{q}_1 \\boldsymbol{q}_1^\\top)\\boldsymbol{a}_3 - (\\boldsymbol{q}_2 \\boldsymbol{q}_2^\\top)\\boldsymbol{a}_3\n",
    "$$\n",
    "\n",
    "然后单位化：\n",
    "\n",
    "$$\n",
    "\\boldsymbol{q}_3 = \\frac{\\boldsymbol{\\eta}_3}{\\|\\boldsymbol{\\eta}_3\\|}\n",
    "$$\n",
    "\n",
    "此时 $\\boldsymbol{q}_3$ 与前两个向量都正交且单位长度。验证 $\\boldsymbol{q}_1^\\top \\boldsymbol{q}_3 = 0$，$\\boldsymbol{q}_2^\\top \\boldsymbol{q}_3 = 0$ 即可确认三向量两两正交。\n",
    "\n",
    "---\n",
    "\n",
    "### 向量投影补充\n",
    "\n",
    "代码中也单独计算了 $\\boldsymbol{a}_3$ 在 $\\boldsymbol{q}_1$ 与 $\\boldsymbol{q}_2$ 上的投影：\n",
    "\n",
    "- 在 $\\boldsymbol{q}_1$ 上的投影为：\n",
    "  $$\n",
    "  \\text{proj}_{\\boldsymbol{q}_1}(\\boldsymbol{a}_3) = \\boldsymbol{q}_1 \\boldsymbol{q}_1^\\top \\boldsymbol{a}_3\n",
    "  $$\n",
    "\n",
    "- 在 $\\boldsymbol{q}_2$ 上的投影为：\n",
    "  $$\n",
    "  \\text{proj}_{\\boldsymbol{q}_2}(\\boldsymbol{a}_3) = \\boldsymbol{q}_2 \\boldsymbol{q}_2^\\top \\boldsymbol{a}_3\n",
    "  $$\n",
    "\n",
    "合并起来，$\\boldsymbol{a}_3$ 在子空间 $\\text{span}(\\boldsymbol{q}_1, \\boldsymbol{q}_2)$ 上的投影是两者的和。\n",
    "\n",
    "---\n",
    "\n",
    "### 总结\n",
    "\n",
    "最终通过格拉姆-施密特过程，将原始矩阵 $A$ 的三列向量正交化，得到三列单位正交向量 $[\\boldsymbol{q}_1\\ \\boldsymbol{q}_2\\ \\boldsymbol{q}_3]$，它们组成一个正交矩阵 $Q$，满足：\n",
    "\n",
    "$$\n",
    "Q^\\top Q = I,\\quad Q Q^\\top = I\n",
    "$$\n",
    "\n",
    "这组向量构成了 $\\mathbb{R}^3$ 中的一个正交基，适合用于QR分解、正交投影、最小二乘等计算。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cc1a9df-7ea3-4c32-a28a-0e06a3560620",
   "metadata": {},
   "source": [
    "## 初始化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "71ef2603-2e3a-4bde-ade4-0583a0434426",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f6d5009-14b1-4928-a3c7-f5562c5a69e4",
   "metadata": {},
   "source": [
    "## 创建数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "11326ba4-69fa-4288-b0a1-2606193ddd88",
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.array([\n",
    "    [2, 2, 0],\n",
    "    [2, 0, 2],\n",
    "    [0, 2, 2]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "aa3a630d-98a0-49e7-a1d7-e117db822030",
   "metadata": {},
   "outputs": [],
   "source": [
    "a1 = A[:,[0]]\n",
    "a2 = A[:,[1]]\n",
    "a3 = A[:,[2]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bf85330-dfcd-4961-9cfa-26b42503be74",
   "metadata": {},
   "source": [
    "## 第一步"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "aa73d0cd-32fa-4dd1-a8cd-7dce14bb53f5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.70710678],\n",
       "       [0.70710678],\n",
       "       [0.        ]])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q1 = a1/np.linalg.norm(a1)\n",
    "q1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9759a921-c7b6-487c-91d6-200f5b29e5a8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证，单位化\n",
    "q1.T @ q1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c055fe8-7330-448d-91f5-02cac2e3233f",
   "metadata": {},
   "source": [
    "## 第二步"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "947c9b24-ac7c-4623-bc00-ed47dd8c54f7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.5, -0.5,  0. ],\n",
       "       [-0.5,  0.5,  0. ],\n",
       "       [ 0. ,  0. ,  1. ]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.eye(3) - q1 @ q1.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "80e052c9-4d1b-4b93-871b-2361f9561005",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.],\n",
       "       [-1.],\n",
       "       [ 2.]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eta_2 = a2 - q1 @ q1.T @ a2\n",
    "eta_2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "04214e3b-86a3-4b5f-8c55-36f4895c52e7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.40824829],\n",
       "       [-0.40824829],\n",
       "       [ 0.81649658]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q2 = eta_2/np.linalg.norm(eta_2)\n",
    "q2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "70cb9a74-32af-46e6-b573-bde1adbfd810",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.],\n",
       "       [1.],\n",
       "       [0.]])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a2_proj_q1 = q1 @ q1.T @ a2\n",
    "a2_proj_q1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ee44d70b-6057-4503-bc68-1030202a8bb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.]])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证，单位化\n",
    "q2.T @ q2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "02318c96-063b-4f85-b6f2-f5948dd843b8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.33333333, -0.33333333, -0.33333333],\n",
       "       [-0.33333333,  0.33333333,  0.33333333],\n",
       "       [-0.33333333,  0.33333333,  0.33333333]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "I = np.eye(3)\n",
    "I - q1 @ q1.T- q2 @ q2.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "7b6f0d68-71fe-4378-a25d-830afee82749",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.11022302e-16]])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证，正交，\n",
    "q1.T @ q2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63132244-309f-457a-aa6a-bcc9797da92e",
   "metadata": {},
   "source": [
    "## 第三步"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "96ff9692-71a6-41b4-bf36-4b581d5f3dc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "eta_3 = a3 - q1 @ q1.T @ a3 - q2 @ q2.T @ a3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "da93199f-0cdb-4db5-bda8-9c8ab92d6d45",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-1.33333333],\n",
       "       [ 1.33333333],\n",
       "       [ 1.33333333]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eta_3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "ab94536b-50b4-4837-91e3-0a290615a6db",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.57735027],\n",
       "       [ 0.57735027],\n",
       "       [ 0.57735027]])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q3 = eta_3/np.linalg.norm(eta_3)\n",
    "q3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "f7d56a5f-2afa-4a12-a18f-2f057de0fa58",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.]])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证，正交，\n",
    "q1.T @ q3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "35df8506-8669-4f71-924f-53e3ed6adb49",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-1.66533454e-16]])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 验证，正交，\n",
    "q2.T @ q3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "44d44800-598b-4f35-bc02-9d107aab4fa4",
   "metadata": {},
   "outputs": [],
   "source": [
    "a3_proj_q1_q2 = q1 @ q1.T @ a3 + q2 @ q2.T @ a3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "d26bb54b-8778-4d29-99ea-adaef3a01d38",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1.],\n",
       "       [1.],\n",
       "       [0.]])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a3_proj_q1 = q1 @ q1.T @ a3\n",
    "a3_proj_q1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "6b3dc935-74d2-43c2-84ba-e8f730797500",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0.33333333],\n",
       "       [-0.33333333],\n",
       "       [ 0.66666667]])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a3_proj_q2 = q2 @ q2.T @ a3\n",
    "a3_proj_q2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0dd7382f-08ca-471f-8d3d-f0609a1527aa",
   "metadata": {},
   "source": [
    "作者\t**生姜DrGinger**  \n",
    "脚本\t**生姜DrGinger**  \n",
    "视频\t**崔崔CuiCui**  \n",
    "开源资源\t[**GitHub**](https://github.com/Visualize-ML)  \n",
    "平台\t[**油管**](https://www.youtube.com/@DrGinger_Jiang)\t\t\n",
    "\t\t[**iris小课堂**](https://space.bilibili.com/3546865719052873)\t\t\n",
    "\t\t[**生姜DrGinger**](https://space.bilibili.com/513194466)  "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:base] *",
   "language": "python",
   "name": "conda-base-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
